ABSTRACT



Data Models are an essential part of automatic data pro- 
cessing, but even more so when trying to tie together data 
coming from many different data sources, as is the case 
for the International Virtual Observatory. In this talk we
will review the different data models used in the IVOA,
the parts of that data modelling work that are still
incomplete, especially at radio wavelengths, and the work
the AMIGA group has done within the IVOA Data Modelling
Working Group to overcome those shortcomings, both in
missing data models and in support for Radio Astronomy.

Key words: Virtual Observatory; Data Modelling; Radio 
Astronomy. 



1. INTRODUCTION 



The AMIGA project (Analysing the interstellar Medium
of Isolated GAlaxies) was born in 2003, and intends to
provide a statistical characterisation of a strictly selected
sample of more than 1000 isolated galaxies, by means of
multi-wavelength data, with a particular emphasis on radio
data at cm, mm, and sub-mm wavelengths. All these data are
being periodically released via the web page of the project,
which provides a Virtual Observatory (VO) ConeSearch
interface.

AMIGA+ is the natural extension to AMIGA, with three 
different goals: exploitation of the AMIGA catalogue, se- 
lecting the best candidates for a detailed study of isolated 
galaxies; scientific extension to the millimetre and sub- 
millimetre range; and participation in the development of 
systems allowing the access and display of large radio as- 
tronomical databases, both single-dish and interferomet- 
ric, within the VO framework. 

During the AMIGA+ project, two VO-compliant radio
astronomical archives were developed: the IRAM 30m
antenna archive (soon to be published), and the DSS-63
archive.

Early in the development phase, we decided not just to
provide a VO compatibility layer on top of these archives'
infrastructures, but to make the internal archive
organisation reflect existing VO data models, in order to
ensure that the VO interface would be able to provide as
much metadata as possible.

AMIGA project web page: http://amiga.iaa.csic.es/



2. DATA MODELS 



Data models are the detailed description of the set of en- 
tities needed for information storage in a particular field, 
and specify both the data being stored, and the relation- 
ships between them. Data models are part of the hidden 
VO infrastructure astronomers would normally never be 
involved with, but knowledge of data models can enhance 
the opportunities for the exploitation of the VO. 

Within the VO, data models apply not only directly to
the scientific data, but also to the metadata describing them.
As the way to structure information depends on the ap- 
plication domain, VO data models describe astronomical 
datasets in a way that is as instrument independent as pos- 
sible, to ensure that the same description can be used for 
data with different provenance. Users must also be able to 
query those data models to be able to find datasets which 
comply with certain properties. 

The IVOA Data Modelling Working Group (DMWG)
started an effort to provide a complete data model
for astronomical observations, the Data Model for
Observations (McDowell et al. 2005). One of its most
important parts was the Characterisation of datasets,
that is, the complete specification of where those datasets
could be found in the spatial, temporal, and spectral axes,
with more axes available (e.g., polarisation) for suitable
datasets. The Observation data model was put on hold,
and the Data Model for Astronomical Dataset
Characterisation (McDowell et al. 2007) was started.

We have built a complete observation data model
for single-dish radio telescopes, the Radio Astronomical
DAta Model for Single-dish telescopes (RADAMS)
(Santander-Vela et al. 2008), based on those two
documents.



3. DATA MODEL ELEMENTS 



When defining a VO data model, we have to specify: 

Entities Being the data model building blocks, they 
group related attributes within a data model. They 
can be mapped to Classes in Object-Oriented Pro- 
gramming (OOP), or Elements in XML. 

Fields They are the actual data elements of the model. 
They map to Attributes in OOP, and they can be 
mapped to Attributes or to Elements without chil- 
dren in XML. 

Relationships The different entities and fields have
hierarchical or relational relationships: for instance, an
observation project has projected observations, and all
entities which share a common project ID are related. For
the data model to be uniquely defined, those relationships
must be made explicit.

Data types For computers to be able to correctly inter- 
pret a data stream a Data type needs to be specified. 
For instance, object IDs could be Integers, but they 
are normally textual, so String data must be used. 
We could consider the restrictions which can be de- 
fined for complex data types in XML as part of the 
data typing. 

Units No physical quantity can be specified without
providing its units. Fields holding physical data need
their Units specified: either the Units are a fixed,
implicit property of a particular Field, or they are
stored in their own dedicated Field.

Semantics As observation metadata are related to
real-world elements and quantities, VO data models
should specify semantics (i.e., what exactly is meant in
the real world by a particular field) to avoid ambiguities.
Most VO semantics are provided via Unified Content
Descriptors (UCDs) and UTypes.
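
As an illustration of how the elements listed above fit together, the following is a minimal sketch in Python. The class names Field and Entity, the validate method, and the example project and observation are our own assumptions for illustration; they are not part of any IVOA data model or of the RADAMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """A single data element of the model (hypothetical sketch)."""
    name: str
    dtype: type                  # data type, e.g. str or float
    unit: Optional[str] = None   # physical unit; None for textual data
    ucd: Optional[str] = None    # semantics: which quantity the field holds
    utype: Optional[str] = None  # semantics: where the field sits in the model

    def validate(self, value) -> bool:
        """Check that a value matches the declared data type."""
        return isinstance(value, self.dtype)


@dataclass
class Entity:
    """A building block grouping related fields and child entities."""
    name: str
    fields: List[Field] = field(default_factory=list)
    # Hierarchical relationship: entities contained in this one.
    children: List["Entity"] = field(default_factory=list)


# Hypothetical example: an observation that belongs to a project.
# The shared project_id field expresses the relational link.
project_id = Field("project_id", str)  # textual ID, hence a string type
ra = Field("ra", float, unit="deg", ucd="pos.eq.ra")
observation = Entity("Observation", fields=[project_id, ra])
project = Entity("Project", fields=[project_id], children=[observation])
```

In this sketch the hierarchical relationship is carried by the children list, and the relational one by the field shared between both entities.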



4. SEMANTICS, UCDS, UTYPES AND IVOA VOCABULARIES

UCDs are a controlled vocabulary, under the supervision
of the IVOA Semantics WG, which provides a list of
atoms which can be used to identify fields as corresponding
to specific astronomical quantities. For instance, a
field containing the Right Ascension can be identified by
the UCD atom pos.eq.ra, while a photometric flux
in the V band can be identified by the two UCD atoms
phot.flux;em.opt.V. This provides both a unified
vocabulary to identify any astrophysical quantity, and
an automatic knowledge discovery tool for fields with
arbitrary relationships. In fact, UCDs were born out of a
joint CDS/ESO data mining effort (Ortiz et al. 1999).
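
To show how software can exploit UCDs for discovery, the sketch below splits a UCD string at its semicolons and dots, and selects the table columns that carry a given quantity. The function names and the column names are hypothetical; only the two UCD strings come from the examples above. It does not validate the strings against the official UCD list.

```python
from typing import Dict, List


def split_ucd(ucd: str) -> List[List[str]]:
    """Split a UCD string into its semicolon-separated parts,
    and each part into its dot-separated components."""
    parts = [part.strip() for part in ucd.split(";") if part.strip()]
    return [part.split(".") for part in parts]


def has_part(ucd: str, wanted: str) -> bool:
    """True if one of the semicolon-separated parts equals `wanted`."""
    return wanted in [".".join(part) for part in split_ucd(ucd)]


def fields_with(columns: Dict[str, str], wanted: str) -> List[str]:
    """Names of the columns whose UCD contains the part `wanted`."""
    return [name for name, ucd in columns.items() if has_part(ucd, wanted)]


# Hypothetical column names mapped to the UCDs quoted in the text.
columns = {"RAJ2000": "pos.eq.ra", "Vflux": "phot.flux;em.opt.V"}
flux_columns = fields_with(columns, "phot.flux")
```

A tool can thus find every flux column in an unknown table without knowing how the data provider named it.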

However, UCDs can only provide information on the kind of
data, not on relationships. In a sense, they are a kind of
specialised unit, complementary (orthogonal) to physical
units: in the same way that quantities with the same
physical units can be very different in nature (e.g., the
decay time of an isotope versus an oscillation period),
fields with identical UCDs can also be related to different
real-world phenomena. UTypes were born to allow such
deeper relationships to be expressed, and to disambiguate
metadata fields.

UTypes are created from a hierarchical data model
by enumerating the different parents a particular field
has in that hierarchy. For instance, a field containing
the Right Ascension, in equatorial coordinates, of the
position an instrument was pointed at corresponds
to the spatial coverage characterisation, in particular
to the Location property. It would thus have a UType of
characterisation.coverage.spatial.location, a UCD of
pos.eq.ra, and its units could be any angular unit.
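
The enumeration of parents can be sketched as a walk over a nested structure. The dictionary below is a heavily reduced, hypothetical slice of a characterisation hierarchy, written only to reproduce the UType of the example; it is not the full CharDM, and the function name utypes is our own.

```python
from typing import Any, Dict, Iterator


def utypes(tree: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield a dotted UType for every leaf by listing its parents."""
    for name, child in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict) and child:
            # Intermediate node: descend, carrying the path so far.
            yield from utypes(child, path)
        else:
            # Leaf node: the accumulated path is the UType.
            yield path


# Hypothetical, reduced hierarchy (assumption, for illustration only).
model = {
    "characterisation": {
        "coverage": {
            "spatial": {"location": None, "bounds": None},
            "spectral": {"location": None},
        }
    }
}

all_utypes = list(utypes(model))
```

Two fields named location then receive different UTypes because they hang from different parents, which is the disambiguation that UCDs alone cannot provide.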

But even with the help of units, UCDs and UTypes, it can
sometimes be difficult to tag a particular piece of data
with meaningful semantics, especially for data which do
not have a direct place in a VO data model. For that we
can borrow techniques from the Semantic Web (an effort
to provide web documents with semantics, so that, for
instance, a table of camera prices can be tagged in such a
way that software tools can identify the prices in it, and
possibly that they belong to digital cameras, or even to
particular brands), and provide one or more standardised
astronomical vocabularies. The IVOA Semantics WG has
started recreating controlled vocabularies such as UCDs in
Semantic Web form, and even the IAU thesaurus has been
recreated in that way (Derriere et al. 2008). We are using
them in the IRAM 30m archive in order to provide semantics
to data coming from antenna engineering terms.



UCD list: http://www.ivoa.net/Documents/latest/UCDlist.html

IVOA Semantics WG: http://www.ivoa.net/cgi-bin/twiki/bin/view/IVOA/IvoaSemantics



5. ROLE OF DATA MODELS IN THE VO



Figure 1. RADAMS structure: a combination of the Observation and Characterisation data models, which have been
completely specified for single-dish radio telescopes. Data models which are completely new in their definition for any
kind of instrument are marked in gray.

We can identify four different phases in the VO, and in
all of them data models play a central role:

Discovery Datasets available in the VO have to be
discoverable for them to appear automatically in
VO tools. The VO Registry holds data for existing
datasets so that they can be easily discovered.
The data models for Space-Time Coordinates (STC),
dataset Characterisation (CharDM), and Resource
metadata (ResDM), together with the UCDs and the IVOA
thesaurus (IVOAT), are relevant in this phase.

Evaluation Datasets have to be evaluated in order to as- 
sess their applicability to the kind of analysis we 
might wish to perform; for instance, in order to do 
image mosaicing we need a certain coordinate over- 
lap, and in order to do image stacking we need an al- 
most complete overlap, and comparable resolutions. 
The main data model involved in this phase is the 
CharDM. 

Data Access There is an implicit data model in the 
IVOA data access protocols, the Data Access Layer, 
which is centred on targets (coordinates with toler- 
ances/search radii), and uses several properties from 
the CharDM, such as the Coverage in several axes. 

Transformation When creating a new dataset, or
transforming an existing one, a new CharDM instance
needs to be created. If the transformed dataset is
a spectrum, the Spectral data model (SpecDM) is
needed both for obtaining the complete description
of the original data and for describing the transformed
product. There is no data model yet for images or for
more complex data within the VO. In addition, in order
to trace the origin of the transformed image we would
need a Provenance data model which, apart from being an
integral part of the Observation data model (ObsDM),
should be built in a stand-alone form so that it can be
applied to newly generated, non-observational data.
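
The Evaluation phase described above can be sketched as a test on coverage metadata. The Coverage class below is a reduced, hypothetical stand-in for the spatial coverage and resolution that a CharDM instance would provide, and the thresholds are arbitrary assumptions. The geometry is a flat-sky approximation that ignores RA wrap-around and the cos(dec) factor, and resolutions are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    """Hypothetical summary of a dataset's spatial characterisation."""
    ra_min: float      # degrees
    ra_max: float      # degrees
    dec_min: float     # degrees
    dec_max: float     # degrees
    resolution: float  # arcsec, assumed positive


def interval_overlap(a_min, a_max, b_min, b_max):
    """Length of the intersection of two 1-D intervals (0 if disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def overlap_fraction(a: Coverage, b: Coverage) -> float:
    """Fraction of a's bounding box that is also covered by b."""
    area_a = (a.ra_max - a.ra_min) * (a.dec_max - a.dec_min)
    if area_a <= 0:
        return 0.0
    shared = (interval_overlap(a.ra_min, a.ra_max, b.ra_min, b.ra_max)
              * interval_overlap(a.dec_min, a.dec_max, b.dec_min, b.dec_max))
    return shared / area_a


def can_mosaic(a: Coverage, b: Coverage, min_overlap: float = 0.05) -> bool:
    """Mosaicing only needs some overlap (threshold is an assumption)."""
    return overlap_fraction(a, b) >= min_overlap


def can_stack(a: Coverage, b: Coverage, min_overlap: float = 0.9,
              max_resolution_ratio: float = 1.5) -> bool:
    """Stacking needs near-complete overlap and comparable resolutions."""
    ratio = max(a.resolution, b.resolution) / min(a.resolution, b.resolution)
    return overlap_fraction(a, b) >= min_overlap and ratio <= max_resolution_ratio
```

Because both tests only read characterisation metadata, a tool could evaluate candidate datasets before downloading any of the data.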



6. RADAMS STRUCTURE AND PROPERTIES 



After having presented the importance of VO data models 
for the different activities involved in VO data queries, 
analyses and transformations, we will present the main 
RADAMS features. 

Figure 1 shows the RADAMS main structure, which can
be seen as a combination of the Observation data model
and the CharDM, but fully specifying classes
that were only laid out by the ObsDM. Some particular
adaptations of the CharDM to radio astronomy, specifically
for the Sensitivity class, have been performed, and
the Target class is able to deal with radio catalogues.

However, the main RADAMS contributions come from the
ObsDM classes which have been completely specified:

Packaging We have developed a VOPack packaging 
standard, which embeds characterisation informa- 
tion for any packaged observation set, and can spec- 
ify the recursive inclusion of other packages. 
Policy The Policy class that we have developed is able to
accommodate different user roles determined from 
the user identification and the explicit or implicit 
policies for each particular dataset. 

Provenance Initially specified only for radio astronomy 
data, we are working to make it more general, and to 
be able to use it outside of observational scopes, so 
that software tools which provide dataset transfor- 
mations can document the origin of their processed 
datasets. 
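
The recursive inclusion of packages mentioned above can be pictured with the following sketch. It is not the VOPack format: the Package class, its attributes and the dataset names are hypothetical, and only illustrate how a tool could list every dataset reachable from a package while guarding against a package that ends up including itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Package:
    """Hypothetical in-memory stand-in for a package of observations."""
    name: str
    datasets: List[str] = field(default_factory=list)        # dataset identifiers
    packages: List["Package"] = field(default_factory=list)  # nested packages


def iter_datasets(package: Package,
                  _seen: Optional[Set[int]] = None) -> Iterator[str]:
    """Yield every dataset in a package, descending into nested packages.
    A package reached more than once (or through a cycle) is visited once."""
    seen = set() if _seen is None else _seen
    if id(package) in seen:
        return
    seen.add(id(package))
    yield from package.datasets
    for child in package.packages:
        yield from iter_datasets(child, seen)


# Hypothetical example: a project package including two nightly packages.
night1 = Package("night1", ["scan1", "scan2"])
night2 = Package("night2", ["scan3"])
project_package = Package("project", ["summary"], [night1, night2])
all_datasets = list(iter_datasets(project_package))
```

The traversal lists a package's own datasets first and then those of each included package, in inclusion order.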



The RADAMS has been implemented in two archives:
the Telescope Access for Public Archive System
(TAPAS) for the IRAM 30m, to be announced in the
February 2009 IRAM Newsletter, and the scientific
archive of the 70 m antenna, DSS-63, at NASA's Deep
Space Center in Madrid. This has shown that VO data
models can be used as a blueprint for archives which are
being built from scratch for VO compatibility.



7. FUTURE WORK 



In the future, we plan to have the RADAMS Provenance 
data model, part of the author's thesis (to be published 
in 2009), contributed to the IVOA DMWG, and have it 
integrated with the ObsDM. 

We are also starting a collaboration with the ALMA
Science Archive team, in order both to create a data model
for ALMA data cubes which can be integrated in a future
IVOA data model for multidimensional data, and to provide
VO compatibility to the ALMA Science Archive.



8. CONCLUSIONS 



Data models are one of the three bases interoperability 
relies on. As the data models for astrophysical obser- 
vations' metadata must be interoperable across different 
software packages and instrument domains, not every de- 
tail can be included in the data models, or can be de- 
scribed using IVOA's UCDs and UTypes. Semantic web 
technologies, such as thesauri expressed in W3C stan- 
dards, can be used to provide additional semantics. 

However, the IVOA high-level data modelling efforts are
complete enough to have guided the development of
the RADAMS data model. The RADAMS has been
successfully used to create two operational radio
astronomical archives for two very different antennas and
instrument systems: the IRAM 30m and the DSS-63.

In the future, the ObsDM must be complete enough to
support the complex datasets to be produced by ALMA
and by radio telescopes further ahead in time, such as
LOFAR and other SKA pathfinders.